Hands-on Exercise 7c: Analytical Mapping

Author

Goh Si Hui

Published

February 26, 2024

Modified

February 27, 2024

1 About this Exercise

In this exercise, we will learn how to plot analytical map such as rate map, percentile map and boxmap.

2 Getting Started

2.1 Installing and Loading R Packages

For this exercise, other than tmap, we will use the following packages:

  • tidyverse for tidying and wrangling data
  • sf for handling geospatial data

The code chunk below uses p_load() of pacman package to check if the abovementioned packages are installed in the computer. If they are, they will be launched in R. Otherwise, pacman will install the relevant packages before launching them.

Show the code
pacman::p_load(tidyverse, tmap, sf)

2.2 Importing the Data

For the purpose of this hands-on exercise, a prepared data set called NGA_wp.rds will be used. The data set is a polygon feature data.frame providing information on water point of Nigeria at the LGA level.

We will use read_rds() function to import the data into R.

Show the code
nga <- read_rds("data/rds/NGA_wp.rds")
glimpse(nga)
Rows: 774
Columns: 9
$ ADM2_EN          <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak", …
$ ADM2_PCODE       <chr> "NG001001", "NG001002", "NG008001", "NG015001", "NG00…
$ ADM1_EN          <chr> "Abia", "Abia", "Borno", "Federal Capital Territory",…
$ ADM1_PCODE       <chr> "NG001", "NG001", "NG008", "NG015", "NG003", "NG011",…
$ geometry         <MULTIPOLYGON [m]> MULTIPOLYGON (((548795.5 11..., MULTIPOL…
$ total_wp         <int> 17, 71, 0, 57, 48, 233, 34, 119, 152, 66, 39, 135, 63…
$ wp_functional    <int> 7, 29, 0, 23, 23, 82, 16, 72, 79, 18, 25, 54, 28, 55,…
$ wp_nonfunctional <int> 9, 35, 0, 34, 25, 42, 15, 33, 62, 26, 13, 73, 35, 36,…
$ wp_unknown       <int> 1, 7, 0, 0, 0, 109, 3, 14, 11, 22, 1, 8, 0, 37, 88, 1…

3 Basic Choropleth Maps

In hands-on exercise 7a, we learnt how to plot choropleth maps. Let us plot the choropleth maps for the functional water points and the total number of water points using the following code chunk.

total <- tm_shape(nga) +
  tm_polygons("total_wp",
              palette = "Greens",
              style = "equal",
              lwd = 0.1,
              alpha = 1) +
  tm_layout(main.title = "Distribution of All Water Points",
            main.title.size = 1,
            legend.outside = FALSE)

total

functional <- tm_shape(nga) +
  tm_polygons("wp_functional",
              palette = "Blues",
              style = "equal",
              lwd = 0.1,
              alpha = 1) +
  tm_layout(main.title = "Distribution of Functional Water Points",
            main.title.size = 1,
            legend.outside = FALSE)

functional

We will use tmap_arrange() to put these two plots side by side

Show the code
tmap_arrange(total, functional, nrow = 1)

4 Choropleth Map for Rates

In much of our learnings, we saw the importance to visualise rates rather than counts of things. This is because water points are not equally distributed in space. Plotting count of things could misrepresent the severity of the issue. For example, if an area has 100 functional water points, it might seem high. But what if that same area has 900 non-functional water points? Hence, displaying rates on choropleth maps might be more useful for our analysis.

4.1 Deriving Proportion of Functional Water Points and Non-Functional Water Points

We will calculate the proportion of functional and non-functional water points in each LGA. In the following code chunk, mutate() from dplyr package is used to derive two fields, namely pct_functional and pct_nonfunctional. We also used round() to round the proportions into 2 decimal places.

nga <- nga %>%
  mutate(pct_functional = round(wp_functional/total_wp,2)) %>%
  mutate(pct_nonfunctional = round(wp_nonfunctional/total_wp,2))

glimpse(nga)
Rows: 774
Columns: 11
$ ADM2_EN           <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak",…
$ ADM2_PCODE        <chr> "NG001001", "NG001002", "NG008001", "NG015001", "NG0…
$ ADM1_EN           <chr> "Abia", "Abia", "Borno", "Federal Capital Territory"…
$ ADM1_PCODE        <chr> "NG001", "NG001", "NG008", "NG015", "NG003", "NG011"…
$ geometry          <MULTIPOLYGON [m]> MULTIPOLYGON (((548795.5 11..., MULTIPO…
$ total_wp          <int> 17, 71, 0, 57, 48, 233, 34, 119, 152, 66, 39, 135, 6…
$ wp_functional     <int> 7, 29, 0, 23, 23, 82, 16, 72, 79, 18, 25, 54, 28, 55…
$ wp_nonfunctional  <int> 9, 35, 0, 34, 25, 42, 15, 33, 62, 26, 13, 73, 35, 36…
$ wp_unknown        <int> 1, 7, 0, 0, 0, 109, 3, 14, 11, 22, 1, 8, 0, 37, 88, …
$ pct_functional    <dbl> 0.41, 0.41, NaN, 0.40, 0.48, 0.35, 0.47, 0.61, 0.52,…
$ pct_nonfunctional <dbl> 0.53, 0.49, NaN, 0.60, 0.52, 0.18, 0.44, 0.28, 0.41,…

4.2 Map of Rate

Now, we will plot a rate map based on the functional water points using the pct_functional values.

functional_rate <- tm_shape(nga) +
  tm_polygons("pct_functional",
              n = 10,
              palette = "Blues",
              style = "equal",
              lwd = 0.1,
              alpha = 1,
              legend.hist = TRUE) +
  tm_layout(main.title = "Rate Map of Functional Water Points by LGAs",
            main.title.size = 1,
            legend.outside = TRUE)

functional_rate

5 Extreme Value Maps

Extreme value maps are variations of common choropleth maps where the classification is designed to highlight extreme values at the lower and upper end of the scale, with the goal of identifying outliers. These maps were developed in the spirit of spatializing EDA, i.e., adding spatial features to commonly used approaches in non-spatial EDA (Anselin 1994).

5.1 Percentile Map

The percentile map is a special type of quantile map with six specific categories: - 0-1%,1-10%, - 10-50%, - 50-90%, - 90-99%, and - 99-100%.

The corresponding breakpoints can be derived by means of the base R quantile command, passing an explicit vector of cumulative probabilities as c(0,.01,.1,.5,.9,.99,1). Note that the begin and endpoint need to be included.

5.1.1 Data Preparation

5.1.1.1 Step 1: Exclude records with NA using drop_na().

nga <- nga %>%
  drop_na()

5.1.1.2 Step 2: Creating customised classification and extracting the values

percent <- c(0, 0.01, 0.1, 0.5, 0.9, 0.99, 1)
var <- nga["pct_functional"] %>%
  st_set_geometry(NULL)

quantile(var[,1], percent)
  0%   1%  10%  50%  90%  99% 100% 
0.00 0.00 0.22 0.48 0.86 1.00 1.00 
Note

When variables are extracted from an sf data.frame, the geometry is extracted as well. For mapping and spatial manipulation, this is the expected behavior, but many base R functions cannot deal with the geometry. Specifically, the quantile() would give an error. As a result st_set_geometry(NULL) is used to drop geometry field.

5.1.1.3 Step 3: Creating get.var function

We will now write an R function as shown below to extract a variable (i.e. wp_nonfunctional) as a vector out of an sf data.frame.

The arguments are: - vname: variable name (as character, in quotes) - df: name of sf data frame

The return is: - v: vector with values (without a column name)

get.var <- function(vname, df) {
  v <- df[vname] %>%
    st_set_geometry(NULL)
  v <- unname(v[,1])
  return(v)
}
Why write functions?

Writing a function has three big advantages over using copy-and-paste:

  • You can give a function an evocative name that makes your code easier to understand.
  • As requirements change, you only need to update code in one place, instead of many.
  • You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).

Source: Chapter 19: Functions of R for Data Science.

5.1.1.4 Step 4: Percentile Mapping Function

Next, we will write a percentile mapping function by using the following code chunk.

percentmap <- function(vnam, df, legtitle=NA, mtitle="Percentile Map"){
  percent <- c(0,.01,.1,.5,.9,.99,1)
  var <- get.var(vnam, df)
  bperc <- quantile(var, percent)
  tm_shape(df) +
  tm_polygons() +
  tm_shape(df) +
     tm_fill(vnam,
             title=legtitle,
             breaks=bperc,
             palette="Blues",
          labels=c("< 1%", "1% - 10%", "10% - 50%", "50% - 90%", "90% - 99%", "> 99%"))  +
  tm_borders() +
  tm_layout(main.title = mtitle, 
            title.position = c("right","bottom"))
}

5.1.1.5 Step 5: Testing the Function

percentmap("total_wp", nga)

5.2 Box Map

A box map is an augmented quartile map, with an additional lower and upper category. When there are lower outliers, then the starting point for the breaks is the minimum value, and the second break is the lower fence. In contrast, when there are no lower outliers, then the starting point for the breaks will be the lower fence, and the second break is the minimum value (there will be no observations that fall in the interval between the lower fence and the minimum value).

ggplot(data = nga,
       aes(x = "",
           y = wp_nonfunctional)) +
  geom_boxplot()

  • Displaying summary statistics on a choropleth map by using the basic principles of boxplot.

  • To create a box map, a custom breaks specification will be used. However, there is a complication. The break points for the box map vary depending on whether lower or upper outliers are present.

5.2.1 Creating the boxbreaks function

The code chunk below is an R function that creating break points for a box map.

The arguments are: - v: vector with observations - mult: multiplier for IQR (default 1.5)

The returns is: - bb: vector with 7 break points compute quartile and fences

boxbreaks <- function(v,mult=1.5) {
  qv <- unname(quantile(v))
  iqr <- qv[4] - qv[2]
  upfence <- qv[4] + mult * iqr
  lofence <- qv[2] - mult * iqr
  # initialize break points vector
  bb <- vector(mode="numeric",length=7)
  # logic for lower and upper fences
  if (lofence < qv[1]) {  # no lower outliers
    bb[1] <- lofence
    bb[2] <- floor(qv[1])
  } else {
    bb[2] <- lofence
    bb[1] <- qv[1]
  }
  if (upfence > qv[5]) { # no upper outliers
    bb[7] <- upfence
    bb[6] <- ceiling(qv[5])
  } else {
    bb[6] <- upfence
    bb[7] <- qv[5]
  }
  bb[3:5] <- qv[2:4]
  return(bb)
}

5.2.2 Creating the get.var function

get.var <- function(vname,df) {
  v <- df[vname] %>% st_set_geometry(NULL)
  v <- unname(v[,1])
  return(v)
}

Let’s test the function we have created.

var <- get.var("wp_nonfunctional", nga) 
boxbreaks(var)
[1] -56.5   0.0  14.0  34.0  61.0 131.5 278.0

5.2.3 Boxmap Function

The code chunk below is an R function to create a box map. - arguments: - vnam: variable name (as character, in quotes) - df: simple features polygon layer - legtitle: legend title - mtitle: map title - mult: multiplier for IQR - returns: - a tmap-element (plots a map)

boxmap <- function(vnam, df, 
                   legtitle=NA,
                   mtitle="Box Map",
                   mult=1.5){
  var <- get.var(vnam,df)
  bb <- boxbreaks(var)
  tm_shape(df) +
    tm_polygons() +
  tm_shape(df) +
     tm_fill(vnam,title=legtitle,
             breaks=bb,
             palette="Blues",
          labels = c("lower outlier", 
                     "< 25%", 
                     "25% - 50%", 
                     "50% - 75%",
                     "> 75%", 
                     "upper outlier"))  +
  tm_borders() +
  tm_layout(main.title = mtitle, 
            title.position = c("left",
                               "top"))
}

Let us now plot the boxmap using the function created.

tmap_mode("view")
boxmap("wp_nonfunctional", nga)
tmap_mode("plot")